{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to transfer raw image data to TFRecords\n",
    "----\n",
    "\n",
    "Hello everyone! This tutorial, like the previous one, is focused on automatizing the data input pipeline.\n",
    "\n",
    "Most of the time, our datasets are too big to read in memory so we have to prepare a pipeline for reading the data in batches from hard disk. I always process my raw data (text, images, tabular) to TFRecords as it makes my life so much easier hehe :).\n",
    "\n",
    "### Tutorial flowchart\n",
    "----\n",
    "![img](tutorials_graphics/images2tfrecords.png)\n",
    "\n",
    "This tutorial will cover the following parts:\n",
    "* *create a function that reads raw images and transfers them to TFRecords.*\n",
    "* *create a function that parses the TFRecords to TF tensors.*\n",
    "\n",
    "So without any further due, let's get started."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import here useful libraries\n",
    "----"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import tensorflow.contrib.eager as tfe\n",
    "import glob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Enable eager mode. Once activated it cannot be reversed! Run just once.\n",
    "tfe.enable_eager_execution()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Transfer raw images to TFRecords\n",
    "----\n",
    "\n",
    "For this task, we will be using a few images from the FER2013 dataset, that you can find in the **datasets/dummy_images** folder. The emotion label can be found in the filename of the image.\n",
    "For example, picture **id7_3.jpg** has the label emotion **3**, which corresponds to the state **'Happy'** as you can see in the dictionary below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the meaning of each emotion index\n",
    "emotion_cat = {0:'Angry', 1:'Disgust', 2:'Fear', 3:'Happy', 4:'Sad', 5:'Surprise', 6:'Neutral'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def img2tfrecords(path_data='datasets/dummy_images/', image_format='jpeg'):\n",
    "    ''' Function to transfer raw images, along with their \n",
    "        target labels, to TFRecords.\n",
    "        Original source code for helper functions: https://goo.gl/jEhp2B\n",
    "        \n",
    "        Args:\n",
    "            path_data: the location of the raw images\n",
    "            image_format: the format of the raw images (e.g. 'png', 'jpeg')\n",
    "    '''\n",
    "    \n",
    "    def _int64_feature(value):\n",
    "        '''Helper function.'''\n",
    "        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))\n",
    "    \n",
    "    def _bytes_feature(value):\n",
    "        '''Helper function.'''\n",
    "        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))\n",
    "    \n",
    "    # Get the filename of each image within the directory\n",
    "    filenames = glob.glob(path_data + '*' + image_format)\n",
    "    \n",
    "    # Create a TFRecordWriter\n",
    "    writer = tf.python_io.TFRecordWriter(path_data + 'dummy.tfrecords')\n",
    "    \n",
    "    # Iterate through each image and write it to the TFrecords file.\n",
    "    for filename in filenames:\n",
    "        # Read raw image\n",
    "        img = tf.read_file(filename).numpy()\n",
    "        # Parse its label from the filename\n",
    "        label = int(filename.split('_')[-1].split('.')[0])\n",
    "        # Create an example (image, label)\n",
    "        example = tf.train.Example(features=tf.train.Features(feature={\n",
    "            'label': _int64_feature(label),\n",
    "            'image': _bytes_feature(img)}))\n",
    "        # Write serialized example to TFRecords\n",
    "        writer.write(example.SerializeToString())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Transfer raw data to TFRecords\n",
    "img2tfrecords()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parse TFRecords to TF tensors\n",
    "----"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parser(record):\n",
    "    '''Function to parse a TFRecords example'''\n",
    "    \n",
    "    # Define here the features you would like to parse\n",
    "    features = {'image': tf.FixedLenFeature((), tf.string),\n",
    "                'label': tf.FixedLenFeature((), tf.int64)}\n",
    "    \n",
    "    # Parse example\n",
    "    parsed = tf.parse_single_example(record, features)\n",
    "\n",
    "    # Decode image \n",
    "    img = tf.image.decode_image(parsed['image'])\n",
    "   \n",
    "    return img, parsed['label']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want me to add anything to this tutorial, please let me know and I will be happy to further enhance it :)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
